26 research outputs found
Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures
Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a chip being widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphic Processing Units (GPUs) have made these an accessible source of parallel processing power in general purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits.
Observing that in practice multi-cores have a small number of cores, we propose a model for low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms. We show that in this model a large class of divide-and-conquer and dynamic programming algorithms can be parallelized with simple modifications to sequential programs, while achieving optimal parallel speedups. We further explore low-degree-parallelism in computation, providing evidence of fundamental differences in practice and theory between systems with a sublinear and linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case.
Efficient strategies to manage shared caches play a crucial role in multi-core performance. We propose a model for paging in multi-core shared caches, which extends classical paging to a setting in which several threads share the cache. We show that in this setting traditional cache management policies perform poorly, and that any effective strategy must partition the cache among threads, with a partition that adapts dynamically to the demands of each thread. Inspired by the shared cache setting,
we introduce the minimum cache usage problem, an extension to classical sequential paging in which algorithms must account for the amount of cache they use.
This cache-aware model seeks algorithms with good performance in terms of faults and the amount of cache used, and has applications in energy efficient caching and in shared cache scenarios.
The wide availability of GPUs has added to the parallel power of multi-cores, however, most applications underutilize the available resources. We propose a model for hybrid computation in heterogeneous systems with multi-cores and GPU, and describe strategies for generic parallelization and efficient scheduling of a large class of divide-and-conquer algorithms.
Lastly, we introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model, that allows for constant time operations on thousands of bits in parallel. We show that a large class of existing algorithms can be
implemented in the Ultra-Wide Word model, achieving speedups comparable to those of multi-threaded computations, while avoiding the more difficult aspects of parallel programming
Recommended from our members
Albany: Using Component-based Design to Develop a Flexible, Generic Multiphysics Analysis Code
Abstract:
Albany is a multiphysics code constructed by assembling a set of reusable, general components. It is an implicit, unstructured grid finite element code that hosts a set of advanced features that are readily combined within a single analysis run. Albany uses template-based generic programming methods to provide extensibility and flexibility; it employs a generic residual evaluation interface to support the easy addition and modification of physics. This interface is coupled to powerful automatic differentiation utilities that are used to implement efficient nonlinear solvers and preconditioners, and also to enable sensitivity analysis and embedded uncertainty quantification capabilities as part of the forward solve. The flexible application programming interfaces in Albany couple to two different adaptive mesh libraries; it internally employs generic integration machinery that supports tetrahedral, hexahedral, and hybrid meshes of user specified order. We present the overall design of Albany, and focus on the specifics of the integration of many of its advanced features. As Albany and the components that form it are openly available on the internet, it is our goal that the reader might find some of the design concepts useful in their own work. Albany results in a code that enables the rapid development of parallel, numerically efficient multiphysics software tools. In discussing the features and details of the integration of many of the components involved, we show the reader the wide variety of solution components that are available and what is possible when they are combined within a simulation capability.
Key Words: partial differential equations, finite element analysis, template-based generic programmin
Recommended from our members
A Vision for Talca's Future: Post-Earthquake Reconstruction Redevelopment Studio
In September 2012, SERVIU, the regional arm of the Chilean Ministry of Hotusing and Urban Development (MINVU), tasked our program with evaluating and reassessing housing reconstruction efforts in Talca, the capital city of the Maule region, in the wake of the 2010 earthquake that affected the region. To carry out this task, we focused on a case study neighborhood in Talca’s historic center, Barrio Las Heras. This document provides the synthesis of fourteen weeks working on the project, including a presentation of the context, our assessment of the strengths and weaknesses of the reconstruction efforts to date, and recommendations for Talca’s future development
Short vs. Extended Answer Questions in Computer Science Exams
Short-answer questions such as multiple-choice, true-or-false, or fillin-the-blank, are popular among instructors when designing written exams in different fields. These types of questions, especially multiplechoice, are suitable for testing a broad range of course topics, while enabling efficient marking and timely feedback for test takers. However, detractors argue that short-answer questions fail to measure high level cognitive skills and encourage surface learning approaches. In this work, we study the suitability of short and extended answer questions in written exams in University-level Computer Science courses. We present an overview of relevant literature on the topic. We discuss whether both formats can be used to assess the same set of skills; we review Computer Science instructors ’ perspectives about the use of multiple choice items in introductory programming courses; and we discuss the influence of assessment types on students ’ learning approaches
Optimal Speedup on a Low-Degree Multi-Core Parallel Architecture (LoPRAM)
Over the last five years, major microprocessor manufacturers have released plans for a rapidly increasing number of cores per microprossesor, with upwards of 64 cores by 2015. In this setting, a sequential RAM computer will no longer accurately reflect the architecture on which algorithms are being executed. In this paper we propose a model of low degree parallelism (LoPRAM) which builds upon the RAM and PRAM models yet better reflects recent advances in parallel (multi-core) architectures. This model supports a high level of abstraction that simplifies the design and analysis of parallel programs. More importantly we show that in many instances it naturally leads to work-optimal parallel algorithms via simple modifications to sequential algorithms
An Experimental Investigation of Set Intersection Algorithms for Text Searching
The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we propose several improved algorithms for computing the intersection of sorted arrays, and in particular for searching sorted arrays in the intersection context. We perform an experimental comparison with the algorithms from the previous studies from Demaine, López-Ortiz and Munro [ALENEX 2001], and from Baeza-Yates and Salinger [SPIRE 2005]; in addition, we implement and test the intersection algorithm from Barbay and Kenyon [SODA 2002] and its randomized variant [SAGA 2003]. We consider both the random data set from Baeza-Yates and Salinger, the Google queries used by Demaine et al., a corpus provided by Google and a larger corpus from the TREC Terabyte 2006 efficiency query stream, along with its own query log. We measure the performance both in terms of the number of comparisons and searches performed, and in terms of the CPU time on two different architectures. Our results confirm or improve the results from both previous studies in their respective context (comparison model on real data and CPU measures on random data), and extend them to new contexts. In particular we show that value-based search algorithms perform well in posting lists in terms of the number of comparisons performed